Annals of Clinical and Analytical Medicine 


Original Research 


Protein kinase-coding genes as novel diagnostic and prognostic biomarkers 
for pancreatic ductal adenocarcinoma 


Protein kinase-coding genes in PDAC 


Sevcan Atay 
Department of Medical Biochemistry, Ege University Faculty of Medicine, Bornova, Izmir, Turkey 


Abstract 

Aim: This study aims to identify the prognostic and diagnostic significance of protein kinase-coding genes in pancreatic ductal adenocarcinoma (PDAC), the
products of which constitute one of the main classes of drug targets in cancer treatment. 

Material and Methods: Whole-genome gene expression data from seven PDAC cohorts (GSE62452, GSE15471, GSE62165, GSE18670, GSE19280, GSE41368, 
GSE71989) were included in the integrative transcriptomic analysis (n tumor=252, n control=131). The differentially expressed genes in PDAC compared to 
controls were identified using a random-effects model and were further validated in the TCGA (The Cancer Genome Atlas) combined with GTEx (Genotype-Tissue 
Expression) cohort (n tumor=179, n control=171). The prognostic significance of the identified genes was then evaluated by integrating survival and transcriptome 
data of over 530 (n=530-1302) patients using OSpaad. 

Results: The integrative transcriptomic analysis revealed a total of seven down-regulated and 33 up-regulated protein kinase-coding genes in PDAC (adjusted 
p-value<0,05; z-value<-2 or z-value>2). The validation analysis using TCGA combined with GTEx data confirmed 80% (n=32) of the identified differentially expressed genes 
in PDAC (p<0,01 and fold change>2). Among these, elevated mRNA expression of nine genes (PTK2, TAOK1, CSNK1A1, EIF2AK2, WNK1, CDK12, CDK6, GSK3B, 
and MAP4K4) was found to be significantly correlated with worse overall survival of patients with PDAC (log-rank p<0,05, HR>1). Overexpression of SYK and 
PRKACB was correlated with better overall survival (log-rank p<0,05, HR<1). 

Discussion: The results of this study suggest that mRNA expression of the identified eleven protein kinase-coding genes can be used as both prognostic and 
diagnostic biomarkers for further clinical validation. 

Introduction 

Protein kinases are enzymes that belong to the transferase 
class (EC:2). The human genome has been shown 
to encode 518 protein kinases [1]. These enzymes regulate 
the biological activity of various proteins, generally in response 
to an intracellular or external stimulus, by transferring the gamma 
phosphate group of ATP to serine, threonine, tyrosine, arginine 
or histidine residues of proteins [2]. Protein kinases are 
classified by the amino acids they phosphorylate. The two main 
classes of protein kinases are tyrosine kinases (EC:2.7.10) and 
serine/threonine kinases (EC:2.7.11). Protein phosphorylation 
is one of the most frequent and important post-translational 
modifications and regulates crucial cellular 
processes such as proliferation, signal transduction, and apoptosis 
[3-5]. Protein kinases are 
frequently altered in human malignancies [6]. These 
alterations include deregulated 
expression, amplification, chromosomal translocation, 
mutation, and epigenetic modifications [6]. Because of 
the high oncogenic potential of these alterations, protein kinases 
represent one of the most targeted groups of enzymes in the 
treatment of diverse types of cancer. 

Pancreatic cancer is one of the deadliest malignancies and the 
fourth most common cause of cancer death [7]. The overall 
5-year survival rate for patients with pancreatic cancer remains 
below 8%, and one-year survival is around 18% when all 
stages are combined [7,8]. Pancreatic ductal adenocarcinoma 
(PDAC) is the most common form of pancreatic cancer, 
constituting more than 90% of cases 
[9]. Delayed diagnosis due to the absence of screening methods, 
and frequent recurrence owing to the highly metastatic and 
chemo-resistant nature of pancreatic cancer, are the major contributors to 
its dismal prognosis. Thus, the identification of diagnostic or 
prognostic biomarkers and targets for effective therapies is an 
urgent need in the treatment of the disease. 

Using an integrative transcriptomic analysis approach, this 
study aims to identify differentially expressed protein kinase-coding 
genes in pancreatic ductal adenocarcinoma and to assess their 
potential as diagnostic and prognostic biomarkers for PDAC. 




Material and Methods 

Selection of Gene Expression Omnibus Datasets and Assessment of Data Quality 

Gene Expression Omnibus (GEO) was searched for datasets 
including mRNA expression data from human-derived 
pancreatic ductal adenocarcinoma tissues and healthy or 
adjacent-to-tumor pancreatic tissues. Eligible datasets were 
subjected to data quality assessment using ExAtlas software 
[10]. Following the ExAtlas manual, samples were considered 
to be of good quality when the correlation of expression of 
housekeeping genes was in the range from 0,5 to 0,95 and the 
standard deviation from the global mean, for each set of genes 
grouped by average expression, was less than 0,3. Samples of 
low quality were not included in the analysis. 
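
The quality criteria above can be written as a simple per-sample filter. The following is a minimal sketch, not ExAtlas code: the function name `passes_quality` and its inputs (one housekeeping-gene correlation per sample and a list of standard deviations, one per expression bin) are hypothetical, and only the thresholds are taken from the text.

```python
def passes_quality(housekeeping_corr, sd_by_bin,
                   corr_range=(0.5, 0.95), max_sd=0.3):
    """Return True when a sample meets both quality criteria.

    housekeeping_corr: correlation of housekeeping-gene expression (assumed input).
    sd_by_bin: standard deviations from the global mean, one per
               expression-level bin (assumed input).
    """
    low, high = corr_range
    corr_ok = low <= housekeeping_corr <= high
    sd_ok = all(sd < max_sd for sd in sd_by_bin)
    return corr_ok and sd_ok


# Hypothetical samples: the second fails on correlation, the third on SD.
samples = {
    "sample_a": (0.80, [0.10, 0.20]),
    "sample_b": (0.40, [0.10, 0.20]),
    "sample_c": (0.80, [0.10, 0.35]),
}
kept = [name for name, (corr, sds) in samples.items() if passes_quality(corr, sds)]
```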

The Integrative Transcriptomic Analysis 

The integrative gene expression analysis was performed by 
combining gene expression datasets from different studies and 
platforms using ImaGeo Software [11]. Effect size was selected 
as the meta-analysis metric, and a random-effects model was 
used for effect size estimation. Samples with more than 1% 
of missing values were ignored during the analysis. Z-values 
were used to measure differential expression between PDAC 
and control tissues. An adjusted p-value less than 0,05 and a 
z-value greater than 2 or less than -2 were considered significant. 
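
To illustrate how a random-effects model pools per-study effect sizes into a single z-value, the sketch below uses the DerSimonian-Laird estimator. This is an assumption made for illustration: the text does not state which between-study variance estimator ImaGeo applies, and the function names are hypothetical. Only the significance thresholds (adjusted p-value below 0,05 and |z| above 2) come from the text.

```python
import math


def random_effects_z(effects, variances):
    """Pool study-level effect sizes (DerSimonian-Laird, assumed estimator).

    Returns (pooled effect, standard error, z-value).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]              # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures heterogeneity between studies.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, pooled / se


def is_significant(adjusted_p, z, p_cut=0.05, z_cut=2.0):
    """Apply the thresholds stated in the text."""
    return adjusted_p < p_cut and abs(z) > z_cut
```

When the studies disagree strongly, the estimated between-study variance grows, the pooled standard error widens, and the z-value shrinks, which is why a gene must be consistently altered across cohorts to pass the cut-off.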

Determination of the differentially expressed protein kinase-coding genes 

The enzyme classes to which the identified differentially 
expressed genes are assigned were searched manually in KEGG 
(Kyoto Encyclopedia of Genes and Genomes). The differentially 
expressed genes encoding protein kinases, which belong to the class 
‘Transferring phosphorus-containing groups’ (EC:2.7.), 
were selected for further analyses. 
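
The search in KEGG was manual, but the selection step amounts to a prefix match on EC numbers. The sketch below assumes a hypothetical dictionary mapping gene symbols to EC numbers and uses the two protein kinase sub-classes named in the Introduction (EC:2.7.10 and EC:2.7.11) as default prefixes; the function name and the non-kinase example entry are illustrative only.

```python
def select_by_ec(annotations, prefixes=("2.7.10.", "2.7.11.")):
    """Return gene symbols with at least one EC number under the given prefixes."""
    prefixes = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    return sorted(
        gene for gene, ec_numbers in annotations.items()
        if any(ec.startswith(prefixes) for ec in ec_numbers)
    )


# Hypothetical annotation table; the last entry is a made-up non-kinase.
example = {
    "SYK": ["2.7.10.2"],
    "CDK12": ["2.7.11.22", "2.7.11.23"],
    "HYPOTHETICAL_GENE": ["3.4.21.4"],
}
kinase_genes = select_by_ec(example)
```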

External Validation of the Identified Differentially Expressed Genes in PDAC 

The external validation of the identified differential expression 
of protein kinase-coding genes in PDAC was performed using the 
GEPIA database by comparing transcriptomic data from 
TCGA PAAD (n=179) with the TCGA normal and GTEx data 
(n=171) [12]. The method for differential analysis was one-way 
analysis of variance (ANOVA), using disease state (Tumor or 
Normal) as the variable for calculating differential expression. 
P<0,01 and fold change>2 were accepted as indicating a statistically 
significant difference in gene expression. 
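
The validation rule can be summarized as a two-part threshold check. The sketch below is not GEPIA code: it assumes the ANOVA p-value has already been obtained, computes fold change as a ratio of group means on a linear scale, and treats a change of more than twofold in either direction as passing. The symmetric handling of down-regulated genes and the function names are assumptions.

```python
from statistics import mean


def fold_change(tumor_values, normal_values):
    """Ratio of mean tumor expression to mean normal expression (linear scale)."""
    return mean(tumor_values) / mean(normal_values)


def is_validated(p_value, fc, p_cut=0.01, fc_cut=2.0):
    """True when p < 0.01 and expression changes more than twofold.

    A fold change below 1/fc_cut is counted as down-regulation (assumption).
    """
    return p_value < p_cut and (fc > fc_cut or fc < 1.0 / fc_cut)


# Hypothetical expression values for one gene.
fc = fold_change([8.0, 8.0, 8.0], [2.0, 2.0, 2.0])
```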

Prognostic Significance of the identified DEGs in PDAC 

The possible association between the mRNA expression level 
of the identified differentially expressed protein kinase-coding 
genes in PDAC and overall survival was evaluated using 
combined long-term follow-up and transcriptomic data of 
over 530 (n=530-1302) pancreatic carcinoma patients from 
seven patient cohorts (TCGA PAAD, ICGC_Array, GSE28735, 
ICGC_Seq, GSE62452, GSE71729, and EMTAB6134). OSpaad 
was used to generate Kaplan-Meier survival curves and to calculate 
the hazard ratio (HR) with 95% confidence intervals and the 
log-rank p-value [13]. The ‘upper 50%’ option was selected as the 
cutoff to split the patient cohort. A log-rank p-value <0,05 was 
considered statistically significant. 
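
OSpaad is a web tool, so its implementation is not described in the text. As a rough sketch of the comparison it reports, the code below splits patients at the median expression (an approximation of the ‘upper 50%’ option) and runs a standard two-group log-rank test using only the Python standard library. The function names are hypothetical, and hazard ratio estimation is left out.

```python
import math
from statistics import median


def split_by_median(expression):
    """Label each patient 1 (expression above the median) or 0 (otherwise)."""
    cut = median(expression)
    return [1 if x > cut else 0 for x in expression]


def logrank_test(times, events, groups):
    """Two-group log-rank test. Returns (chi-square, p-value) with 1 d.f.

    times: follow-up times; events: 1 = death, 0 = censored; groups: 0 or 1.
    """
    observed = expected = variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = [(g, ti, e) for ti, e, g in zip(times, events, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for g, _, _ in at_risk if g == 1)
        d = sum(1 for _, ti, e in at_risk if e and ti == t)
        d1 = sum(1 for g, ti, e in at_risk if g == 1 and e and ti == t)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed - expected) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical cohort in which high expression goes with early death.
expression = [8, 7, 6, 5, 4, 3, 2, 1]
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1] * 8
groups = split_by_median(expression)
chi2, p = logrank_test(times, events, groups)
```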




Results 

GEO Datasets included in the integrative transcriptomic analysis 

A total of ten GEO gene expression microarray datasets, 
including transcriptomic data from 271 PDAC tissues and 150 
healthy or adjacent non-tumor pancreatic tissue 
samples, were found to be eligible for the analysis. Three 
datasets, comprising 17 PDAC tissues and 18 control 
tissues, did not pass quality control and were therefore 
excluded from the analysis. Three further samples from the remaining 
datasets were not included in the analysis due to poor quality. 
Consequently, transcriptomic data of 252 PDAC tissues and 
131 control samples from seven GEO datasets were included in 
the study. Healthy pancreatic tissues and adjacent non-tumor 
pancreatic tissues were used as the control group. The GEO 
datasets included in the integrative transcriptomic analysis are 
shown in Table 1. 

Table 1. Datasets used in the meta-analysis. The table shows 
the GEO identifier of each dataset and the number of samples 
assigned to the control group and the PDAC group included in 
the meta-analysis. ‘Excluded samples’ shows the number of 
samples that were not included in the meta-analysis due to 
poor quality. 

GEO Dataset   Controls (n)   PDAC (n)      Excluded Samples (n)   PMID
GSE62452      [illegible]    [illegible]   [illegible]            [illegible]
GSE15471      35             36            1 control              19260470
GSE62165      [illegible]    [illegible]   [illegible]            27520560
GSE18670      [illegible]    [illegible]   [illegible]            23157946
GSE19280      3              4             0                      [illegible]
GSE41368      6              6             0                      24120476
GSE71989      [illegible]    [illegible]   [illegible]            [illegible]

Differentially Expressed Genes in PDAC Compared to Controls 

A total of 278 genes were found to be down-regulated and 
959 genes were found to be up-regulated in PDAC compared 
to control tissues (adjusted p-value<0,05; z-value<-2 or z-value>2). The 
heatmap of the identified top 100 differentially expressed 
genes in PDAC is shown in Figure 1. 

Differentially Expressed Protein kinase-coding Genes in PDAC 

A total of forty protein kinase-coding genes were found to be 
significantly altered in PDAC. Most of the significantly altered 
protein kinase-coding genes in PDAC were found to belong to 
the protein-serine/threonine kinases (EC:2.7.11., n=35). Five 
genes (TK2, JAK1, JAK3, PTK2, and SYK) coding protein-tyrosine 
kinases (EC:2.7.10) were found to be up-regulated in PDAC. 

External Validation of the Differential Gene Expression in TCGA-PAAD Data 

The differential expression of the identified protein kinase-coding 
genes (n=40) was further validated in TCGA-PAAD 
combined with GTEx data to increase reliability. 


Table 2. The identified differentially expressed protein kinase-coding genes in PDAC. Only genes whose differential expression 
in PDAC was validated in the TCGA-PAAD cohort are listed (p<0,01 and fold change>2). EC number indicates Enzyme Commission 
Number. P-val and FDR_p-val are P-values and adjusted P-values, respectively. Z-value is a measure of the differential expression. 
A positive z-value means that the gene is overexpressed, and a negative z-value means underexpression. 

EC Number                Gene Symbol   FDR_p-val   P-val      Z-value       Gene_Name

Enzyme Sub-Subclass: EC:2.7.10. Protein-Tyrosine Kinases
EC:2.7.10.1              TK2           0,0044      0,0015     3,2           thymidine kinase 2, mitochondrial
EC:2.7.10.2              JAK1          0,021       0,0096     2,6           Janus kinase 1
EC:2.7.10.2              JAK3          0,027       0,013      2,5           Janus kinase 3
EC:2.7.10.2              PTK2          0,00087     0,0002     3,7           protein tyrosine kinase 2
EC:2.7.10.2              SYK           0,0015      0,00039    3,5           spleen associated tyrosine kinase

Enzyme Sub-Subclass: EC:2.7.11. Protein-Serine/Threonine Kinases
EC:2.7.11.1              TAOK1         0,0004      0,000073   4             TAO kinase 1
EC:2.7.11.1              MKNK1         3,20E-06    2,00E-07   -5,2          MAP kinase interacting serine/threonine kinase 1
EC:2.7.11.1              BMP2K         0,0045      0,0015     3,2           BMP2 inducible kinase
EC:2.7.11.1              CSNK1A1       5,70E-07    2,60E-08   5,6           casein kinase 1 alpha 1
EC:2.7.11.1              EIF2AK2       0,00064     0,00014    3,8           eukaryotic translation initiation factor 2 alpha kinase 2
EC:2.7.11.1              MAP4K4        0,000011    9,40E-07   4,9           mitogen-activated protein kinase kinase kinase kinase 4
EC:2.7.11.1              MLKL          0,000049    5,50E-06   4,5           mixed lineage kinase domain like pseudokinase
EC:2.7.11.1              PAK1          7,60E-06    5,80E-07   5             p21 (RAC1) activated kinase 1
EC:2.7.11.1              RIPK3         0,0019      0,00054    [illegible]   receptor interacting serine/threonine kinase 3
EC:2.7.11.1              RPS6KA3       0,027       0,013      2,5           ribosomal protein S6 kinase A3
EC:2.7.11.1              SRPK2         0,00064     0,00014    3,8           SRSF protein kinase 2
EC:2.7.11.1              STK10         0,00036     0,000066   4             serine/threonine kinase 10
EC:2.7.11.1              STK17B        0,00064     0,00013    3,8           serine/threonine kinase 17b
EC:2.7.11.1              STK4          0,0001      0,000014   4,3           serine/threonine kinase 4
EC:2.7.11.1              UHMK1         2,30E-06    1,40E-07   5,3           U2AF homology motif kinase 1
EC:2.7.11.1              WNK1          2,20E-07    8,50E-09   5,8           WNK lysine deficient protein kinase 1
EC:2.7.11.1              AKT3          0,0027      0,00082    [illegible]   AKT serine/threonine kinase 3
EC:2.7.11.1              LATS2         0,00075     0,00017    3,8           large tumor suppressor kinase 2
EC:2.7.11.11             PRKACB        0,00069     0,00015    3,8           protein kinase cAMP-activated catalytic subunit beta
EC:2.7.11.13             PRKD3         0,029       0,014      2,5           protein kinase D3
EC:2.7.11.18             MYLK          0,00029     0,000049   4,1           myosin light chain kinase
EC:2.7.11.22 2.7.11.23   CDK12         0,00087     0,0002     3,7           cyclin dependent kinase 12
EC:2.7.11.22 2.7.11.23   CDK19         0,0013      0,00032    3,6           cyclin dependent kinase 19
EC:2.7.11.22             CDK6          0,0011      0,00028    3,6           cyclin dependent kinase 6
EC:2.7.11.25             MAP3K2        0,00082     0,00019    [illegible]   mitogen-activated protein kinase kinase kinase 2
EC:2.7.11.25             MAP3K8        0,0041      0,0013     3,2           mitogen-activated protein kinase kinase kinase 8
EC:2.7.11.26             GSK3B         0,037       0,019      [illegible]   glycogen synthase kinase 3 beta

The analysis validated the differential 
expression of 32 (80%) of the protein kinase-coding genes identified 
in PDAC (p<0,01). The identified and externally validated 
differentially expressed protein kinase-coding genes in PDAC 
are shown in Table 2. 

Prognostic Value of the Significantly altered Protein kinase-coding Genes in PDAC 

The prognostic significance of the 32 identified and validated 
differentially expressed kinase-coding genes in 
PDAC was evaluated using integrated long-term follow-up and 
transcriptome data from over 530 (n=530-1302) patients with 
PDAC. 

The mRNA expression of 11 genes was found to be 
significantly associated with the overall survival of patients 
with PDAC (log-rank p<0,05). Among these, two genes, PTK2 
(protein tyrosine kinase 2) and SYK (spleen associated tyrosine 
kinase), belong to the protein-tyrosine kinase sub-subclass. The others, 
MAP4K4 (mitogen-activated protein kinase kinase kinase 
kinase 4), TAOK1 (TAO kinase 1), CSNK1A1 (casein kinase 1 
alpha 1), EIF2AK2 (eukaryotic translation initiation factor 2 
alpha kinase 2), WNK1 (WNK lysine deficient protein kinase 
1), PRKACB (protein kinase cAMP-activated catalytic subunit 
beta), GSK3B (glycogen synthase kinase 3 beta), CDK12 
(cyclin-dependent kinase 12), and CDK6 (cyclin-dependent 
kinase 6), belong to the protein serine/threonine kinase 
sub-subclass of transferases. Overexpression of nine genes (PTK2, 
TAOK1, CSNK1A1, EIF2AK2, WNK1, CDK12, CDK6, GSK3B, and 
MAP4K4) was found to be significantly correlated with worse 
overall survival of patients with PDAC (log-rank p<0,05, HR>1). 
Elevated gene expression of SYK and PRKACB was correlated 
with better overall survival (log-rank p<0,05, HR<1). The 
Kaplan-Meier survival curves for the identified prognostic genes are 
shown in Figure 2A-K. 


Discussion 

In this study, the differentially expressed protein kinase-coding 
genes in pancreatic ductal adenocarcinoma were determined 
using an integrative transcriptome meta-analysis approach. 
The identified genes were then externally validated in TCGA 
PAAD combined GTEx data involving data from PDAC samples 
and healthy pancreatic tissues to increase the reliability of 
the findings. Non-validated genes were excluded from further 
analysis. Eighty percent (n=32) of the identified differentially 
expressed genes were validated in TCGA combined GTEx 
cohort, suggesting that mRNA expression of these genes may 
have the potential to be diagnostic biomarkers for PDAC. Since 
protein kinases are preferred targets for cancer therapies, 
these dysregulated genes also merit further study as 
candidates for the development of more effective 
treatment strategies for PDAC. 

The identified protein kinase-coding genes were further 
evaluated in terms of their prognostic significance in PDAC. 
Among the 32 differentially expressed protein kinase-coding 
genes, overexpression of nine genes (PTK2, TAOK1, CSNK1A1, 
EIF2AK2, WNK1, CDK12, CDK6, GSK3B, and MAP4K4) was 
found to be significantly associated with worse overall survival. 
Furthermore, overexpression of SYK and PRKACB correlated 
with better overall survival in the combined seven patient 
cohorts in OSpaad. Among the identified prognostic genes, 
a significant relationship between overexpression of eight 
genes (PTK2, TAOK1, CSNK1A1, EIF2AK2, WNK1, CDK12, 
CDK6, and PRKACB) and the overall survival of patients with 
PDAC has not previously been reported. Knowledge about the 
functional significance of these dysregulated genes 
in the pathogenesis of PDAC is limited and deserves further 
investigation. 

Figure 1. Gene expression heatmap of top 100 differentially 
expressed genes identified in PDAC compared to the controls. 
Green indicates down-regulated mRNA levels and red elevated 
levels (adjusted p-value <0,05). Neon green and purple strips 
mark control samples and PDAC tissue samples, respectively. 

In this study, MAP4K4 was the gene most significantly 
associated with the overall survival of patients with PDAC. 
MAP4K4, which belongs to the mammalian STE20/MAP4K family, is 
involved in important cellular processes such as migration and 
proliferation [14,15]. Elevated protein expression of MAP4K4 
has previously been associated with worse prognosis in pancreatic 
ductal adenocarcinoma [16]. However, the prognostic 
significance of MAP4K4 has been evaluated only in stage II 
PDAC [16]. In this study, overexpression of MAP4K4 was found 
to be significantly associated with decreased overall survival 
of patients with PDAC. Therefore, the results of the 
present study indicate that MAP4K4 is overexpressed in PDAC 
compared to control tissues, and that elevated mRNA expression of 
MAP4K4 may have the potential to be a prognostic biomarker 
for PDAC, which warrants further clinical investigation. 

Figure 2. The identified differentially expressed protein kinase-coding genes whose mRNA levels were significantly associated 
with overall survival of patients with PDAC. Kaplan-Meier survival plots were created based on the TCGA PAAD dataset and 
ranked in ascending order by p-value, separately for genes associated with good prognosis and poor prognosis. A-I show Kaplan-Meier 
survival curves for genes whose high expression is associated with worse overall survival. J and K show Kaplan-Meier survival 
curves for the identified genes whose mRNA expression correlated with better overall survival. P<0,05 was accepted as statistically 
significant. 

Another gene found to be overexpressed and 
associated with poor overall survival was GSK3B. GSK3B plays 
an important role in the regulation of the cell cycle, transcription, 
proliferation, differentiation, and apoptosis. Overexpression 
and activity of GSK3B have been reported in pancreatic cancer 
cells compared to non-neoplastic cells [17]. Moreover, it has 
been shown that inhibition of GSK3B significantly reduced 
proliferation and survival of cancer cells, sensitized them 
to gemcitabine and ionizing radiation, and attenuated their 
migration and invasion in vitro [17,18]. Furthermore, GSK3B 
gene expression has been shown to be increased by Ras, which 
is frequently (>90%) mutated in pancreatic cancer, through 
Raf/MEK/ERK signaling [19]. GSK3B promotes constitutive 
NF-κB signaling, which is an important pathway in cancer cell 
survival, growth and responses to chemotherapeutic agents 
[20]. In accordance, it has been reported that inhibition of 
GSK3B arrests pancreatic tumor growth in vivo and decreases 
NF-κB-mediated pancreatic cancer cell survival and 
proliferation in established tumor xenografts [21]. However, a 
higher protein level of GSK3B has been shown to be associated 
with a better survival rate of patients with pancreatic cancer 
(n=163) [22]. In this study, GSK3B was found to be up-regulated 
in PDAC compared to controls, including adjacent 
non-cancerous tissues and normal pancreatic tissues (n=252 
tumor vs. n=131 controls), and the identified high expression 
of GSK3B in PDAC was validated in an external cohort including 
179 pancreatic cancer patients and healthy controls (n=171). 
In addition, the correlation between high mRNA 
expression of GSK3B and overall survival of patients was 
evaluated in a cohort including a total of 1301 patients with 
PDAC. High tumoral mRNA expression of 
GSK3B was significantly associated with lower overall survival 
of patients with PDAC. Overall, these findings indicate that the 
mRNA level of GSK3B has the potential to be both a diagnostic 
and a prognostic biomarker for PDAC. However, the correlation 
between mRNA and protein levels of GSK3B, as well as the 
potential difference between their prognostic values in PDAC, 
should be investigated in further studies. 

Furthermore, the results of the present study are in 
accordance with those of a previous bioinformatics study 
reporting that overexpression of SYK in pancreatic cancer 
patients is correlated with better overall survival, which 
highlights the prognostic value of SYK for PDAC [23]. 

Conclusion 

Overall, the results of this study revealed protein kinase-coding 
genes whose altered mRNA expression levels can serve as both 
diagnostic and prognostic biomarkers for PDAC. Additional 
studies are necessary to validate the suggested diagnostic and 
prognostic significance of differentially expressed genes in 
pancreatic ductal adenocarcinoma. 